SVD
Understanding the Geometry of \(A\in \mathbb{R}^{m\times n}\)
\[ A\in \mathbb{R}^{m\times n} \\ x\in \mathbb{R}^n \mapsto Ax \in \mathbb{R}^m \]
The map \(\mathbf{x} \mapsto A\mathbf{x}\) sends the unit circle to an ellipse. What we are formalizing is that this ellipse has a major axis and a minor axis: \(A\) stretches along these axes and rotates them out of the coordinate frame.
Critical points of the ratio:
\[ R(\mathbf{v}) = \frac{||A\mathbf{v}||_2}{||\mathbf{v}||_2} \]
Since \(R(\alpha \mathbf{v}) = R(\mathbf{v})\), we can assume \(||\mathbf{v}||_2 = 1\).
\[ \max_{\mathbf{v}}||A\mathbf{v}||_2 \quad \text{subject to } ||\mathbf{v}||_2 = 1 \rightarrow A^TA\mathbf{v} = \lambda \mathbf{v}, \lambda \geq 0 \]
By the Spectral Theorem, \(A^TA\) (symmetric and positive semidefinite) has \(n\) real, nonnegative eigenvalues and \(n\) orthonormal eigenvectors \(\mathbf{v}_1, \dots, \mathbf{v}_n\), which span \(\mathbb{R}^n\).
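As a quick numerical check (a minimal numpy sketch; the random matrix and seed are illustrative choices, not from the lecture), the top eigenvector of \(A^TA\) does maximize \(R\):

```python
import numpy as np

# Hypothetical example matrix; any A works here.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))

# eigh handles the symmetric matrix A^T A; eigenvalues come back ascending.
lam, V = np.linalg.eigh(A.T @ A)
v_top = V[:, -1]  # eigenvector for the largest eigenvalue

R = lambda v: np.linalg.norm(A @ v) / np.linalg.norm(v)
for _ in range(1000):  # no random direction should beat v_top
    v = rng.standard_normal(3)
    assert R(v) <= R(v_top) + 1e-12
print(R(v_top), np.sqrt(lam[-1]))  # both equal sqrt(lambda_max)
```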
Singular Value Decomposition
Define \(u_i^* = A\mathbf{v}_i\).
\[ \begin{aligned} &A^TA\mathbf{v}_i = \lambda_i \mathbf{v}_i \\ &\Rightarrow A^Tu_i^* = \lambda_i \mathbf{v}_i \\ &\Rightarrow A A^Tu_i^* = \lambda_i A\mathbf{v}_i = \lambda_i u_i^* \\ \end{aligned} \]
So, \(u_i^*\) is an eigenvector of \(AA^T\) when it is nonzero.
\[ ||u_i^*||_2 = ||A\mathbf{v}_i||_2 = \sqrt{\mathbf{v}_i^T A^TA \mathbf{v}_i} = \sqrt{\mathbf{v}_i^T \lambda_i \mathbf{v}_i} = \sqrt{\lambda_i} \]
Two cases:
- \(\lambda_i \neq 0 \Rightarrow u_i^* = A\mathbf{v}_i\) is an eigenvector of \(AA^T\) with \(||u_i^*||_2 = \sqrt{\lambda_i}\)
- \(\lambda_i = 0 \Rightarrow u_i^* = \mathbf{0}\) (since \(||u_i^*||_2 = \sqrt{\lambda_i} = 0\))
Take \(k:= \text{\# nonzero eigenvalues of } A^TA\).
\[ \exists \mathbf{v}_1, \dots, \mathbf{v}_k \in \mathbb{R}^n, u_1, \dots, u_k \in \mathbb{R}^m \text{ with: }\\ \begin{aligned} A^TA\mathbf{v}_i &= \lambda_i \mathbf{v}_i\\ AA^Tu_i &= \lambda_i u_i \quad (u_i := u_i^* / ||u_i^*||_2)\\ A\mathbf{v}_i &= \sqrt{\lambda_i} u_i\\ A^Tu_i &= \sqrt{\lambda_i} \mathbf{v}_i\\ \end{aligned}\\ \]
Observation
\[ u_i^TA\mathbf{v}_j = (A^Tu_i)^T \mathbf{v}_j = \sqrt{\lambda_i} \mathbf{v}_i^T \mathbf{v}_j = \sqrt{\lambda_i} \delta_{ij}\\ \]
\[ \overline{U} = \begin{bmatrix} | & | & & | \\ u_1 & u_2 & \dots & u_k \\ | & | & & | \\ \end{bmatrix} \in \mathbb{R}^{m\times k} \\ \overline{V} = \begin{bmatrix} | & | & & | \\ \mathbf{v}_1 & \mathbf{v}_2 & \dots & \mathbf{v}_k \\ | & | & & | \\ \end{bmatrix} \in \mathbb{R}^{n\times k} \\ \overline{U}^TA\overline{V} = \overline{\Sigma},\\ \overline{\Sigma} = \text{diag}(\sqrt{\lambda_1}, \dots, \sqrt{\lambda_k}) \in \mathbb{R}^{k\times k} \]
Complete \(\overline{U} \rightarrow U \in \mathbb{R}^{m\times m}\) and \(\overline{V} \rightarrow V \in \mathbb{R}^{n\times n}\) by appending orthonormal bases for the nullspaces of \(A^T\) and \(A\), respectively.
Define \(\Sigma \in \mathbb{R}^{m\times n}\) s.t.
\[ \Sigma_{ij} = \begin{cases} \sqrt{\lambda_i}=\sigma_i & i = j \leq k \\ 0 & \text{otherwise} \end{cases}\\ A = U\Sigma V^T \\ U^TU = I_{m\times m}, V^TV = I_{n\times n} \] \(\Sigma\) is a diagonal matrix with nonnegative entries.
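A minimal numpy sketch of this construction (an illustration of mine, assuming \(A\) has full column rank so that every \(\lambda_i > 0\) and \(k = n\), which sidesteps the basis-completion step for \(U\)):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 5, 3
A = rng.standard_normal((m, n))   # full column rank almost surely

lam, V = np.linalg.eigh(A.T @ A)  # symmetric eigenproblem, ascending order
lam, V = lam[::-1], V[:, ::-1]    # reorder so sigma_1 >= sigma_2 >= ...
sigma = np.sqrt(lam)
U = (A @ V) / sigma               # u_i = A v_i / ||A v_i||_2, column by column

assert np.allclose(U.T @ U, np.eye(n))           # orthonormal columns
assert np.allclose(A, U @ np.diag(sigma) @ V.T)  # A = U Sigma V^T (thin)
```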
Geometry of SVD
\[ A = U\Sigma V^T \\ \]
- Rotate (\(V^T\))
- Scale (\(\Sigma\))
- Rotate (\(U\))
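A small sketch of this rotate, scale, rotate reading (the \(2\times 2\) matrix is an arbitrary example, not from the lecture):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
U, s, Vt = np.linalg.svd(A)

theta = np.linspace(0.0, 2.0 * np.pi, 100)
circle = np.stack([np.cos(theta), np.sin(theta)])  # unit circle, shape 2x100

step1 = Vt @ circle          # rotate (possibly reflect) into singular axes
step2 = np.diag(s) @ step1   # stretch axis i by sigma_i: circle -> ellipse
step3 = U @ step2            # rotate the ellipse into its final position
assert np.allclose(step3, A @ circle)
```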
SVD Vocabulary
- Left singular vectors: Columns of \(U\); span \(\text{col } A\)
- Right singular vectors: Columns of \(V\); span \(\text{row } A\)
- Singular values: Diagonal \(\sigma_i\) of \(\Sigma\); sort \(\sigma_1 \geq \sigma_2 \geq \dots \geq 0\)
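In numpy terms (a sketch for orientation, not lecture code), these conventions appear as:

```python
import numpy as np

# Arbitrary example; note np.linalg.svd returns V^T (here Vt), not V.
rng = np.random.default_rng(2)
A = rng.standard_normal((4, 6))

U, s, Vt = np.linalg.svd(A)
assert np.all(s[:-1] >= s[1:]) and np.all(s >= 0)  # sorted descending
# Left singular vectors: columns of U; right singular vectors: columns of Vt.T.
```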
Second Approach to SVD: Algebraic
\[ B:=\begin{bmatrix} 0 & A^T \\ A & 0 \end{bmatrix} \]
Proposition: Take \(\mathbf{x} = (\mathbf{x}_1, \mathbf{x}_2) \in \mathbb{R}^{m+n}\) to be an eigenvector of \(B\) with eigenvalue \(\lambda\), where \(\mathbf{x}_1 \in \mathbb{R}^n, \mathbf{x}_2 \in \mathbb{R}^m\). Then, \(\mathbf{x}':=(\mathbf{x}_1, -\mathbf{x}_2)\) is an eigenvector of \(B\) with eigenvalue \(-\lambda\). (Check: \(B\mathbf{x} = \lambda\mathbf{x}\) means \(A^T\mathbf{x}_2 = \lambda\mathbf{x}_1\) and \(A\mathbf{x}_1 = \lambda\mathbf{x}_2\), so \(B\mathbf{x}' = (A^T(-\mathbf{x}_2), A\mathbf{x}_1) = (-\lambda\mathbf{x}_1, \lambda\mathbf{x}_2) = -\lambda\mathbf{x}'\).)
On the board: \(A=U\Sigma V^T\) for diagonal \(\Sigma\) ("economy" SVD)
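A quick numerical check of this approach (a sketch; sizes are arbitrary): the eigenvalues of \(B\) come in \(\pm\sigma_i\) pairs plus \(|m-n|\) zeros, so the \(\sigma_i\) can be read off a symmetric eigensolve.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 4, 3
A = rng.standard_normal((m, n))

B = np.block([[np.zeros((n, n)), A.T],
              [A, np.zeros((m, m))]])
mu = np.linalg.eigvalsh(B)                   # B is symmetric: eigvalsh applies
sigma = np.linalg.svd(A, compute_uv=False)   # singular values, descending

# Eigenvalues pair up as +/- sigma_i, padded with |m - n| zeros.
assert np.allclose(np.sort(mu)[-n:], np.sort(sigma))
assert np.allclose(np.sort(mu)[:n], -np.sort(sigma)[::-1])
```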
Many SVDs
- Full SVD: \(U \in \mathbb{R}^{m\times m}, \Sigma \in \mathbb{R}^{m\times n}, V \in \mathbb{R}^{n\times n}\)
- Thin SVD: \(U \in \mathbb{R}^{m\times k}, \Sigma \in \mathbb{R}^{k\times k}, V \in \mathbb{R}^{n\times k}\), where \(k = \min\{m, n\}\)
- Compact SVD: \(U \in \mathbb{R}^{m\times k}, \Sigma \in \mathbb{R}^{k\times k}, V \in \mathbb{R}^{n\times k}\), where \(k = \text{rank } (A)\)
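In numpy (a sketch): `full_matrices` toggles full vs. thin, and the compact variant keeps only the nonzero \(\sigma_i\).

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 3))

U, s, Vt = np.linalg.svd(A, full_matrices=True)      # full: U is 6x6
Ut, st, Vtt = np.linalg.svd(A, full_matrices=False)  # thin: U is 6x3
assert U.shape == (6, 6) and Ut.shape == (6, 3)

r = int(np.sum(s > 1e-12))                  # numerical rank
Uc, sc, Vc = Ut[:, :r], st[:r], Vtt[:r, :]  # compact: keep nonzero sigma_i
assert np.allclose(A, Uc @ np.diag(sc) @ Vc)
```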
Computing SVD: Two Strategies
- Build from eigenvectors of \(A^TA\) and \(AA^T\)
- Build from eigenvectors of \(\begin{bmatrix} 0 & A^T \\ A & 0 \end{bmatrix}\)
\(\exists\) more specialized methods!
Solving Linear Systems with \(A=U\Sigma V^T\)
\[ A\mathbf{x} = \mathbf{b}\\ \Rightarrow U\Sigma V^T \mathbf{x} = \mathbf{b}\\ \Rightarrow \mathbf{x} = V\Sigma^{-1} U^T \mathbf{b} \]
(This requires \(A\) square and invertible, so that \(\Sigma^{-1}\) exists; the general case follows below.)
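A sketch of this solve (the random square matrix, almost surely invertible, is an arbitrary example):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 4))
b = rng.standard_normal(4)

U, s, Vt = np.linalg.svd(A)
x = Vt.T @ ((U.T @ b) / s)   # Sigma^{-1} is an elementwise divide by sigma_i
assert np.allclose(A @ x, b)
```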
Uniting Short/Tall Matrices
\[ \min_{\mathbf{x}} ||A\mathbf{x} - \mathbf{b}||_2\rightarrow A^TA\mathbf{x} = A^T\mathbf{b}\\ \]
This can be recast as the constrained problem:
\[ \textrm{minimize } ||\mathbf{x}||_2^2 \quad \text{subject to } A^T A \mathbf{x} = A^T \mathbf{b} \]
The reason: the original problem has three cases:
1. \(A\) invertible (completely determined): a unique solution.
2. \(A\) overdetermined: a least-squares solution.
3. \(A\) underdetermined: infinitely many solutions.
In all three cases, the solution satisfies the constraint \(A^TA\mathbf{x} = A^T\mathbf{b}\).
\[ A=U\Sigma V^T \Rightarrow A^TA \mathbf{x} = V\Sigma^T U^T U\Sigma V^T \mathbf{x} = V\Sigma^T \Sigma V^T \mathbf{x},\\ A^T \mathbf{b} = V\Sigma^T U^T \mathbf{b}\\ \therefore A^TA\mathbf{x} = A^T\mathbf{b} \Rightarrow \Sigma^T \Sigma \mathbf{y} = \Sigma^T \mathbf{d}, \quad \mathbf{y}:= V^T \mathbf{x}, \; \mathbf{d}:= U^T \mathbf{b}\\ \]
This transforms the problem into:
\[ \min ||\mathbf{y}||_2^2 \quad \text{s.t. } \Sigma^T \Sigma \mathbf{y} = \Sigma^T \mathbf{d}\\ \rightarrow \sigma_i^2 y_i = \sigma_i d_i \quad \text{for each } i \Rightarrow y_i = \begin{cases} \frac{d_i}{\sigma_i} & \sigma_i \neq 0 \\ 0 & \sigma_i = 0 \end{cases}\\ \Rightarrow \mathbf{y} = \Sigma^{+} \mathbf{d}, \quad \Sigma^{+}_{ii} := \begin{cases} \frac{1}{\sigma_i} & \sigma_i \neq 0 \\ 0 & \text{otherwise} \end{cases}\\ \Rightarrow \mathbf{x} =V \mathbf{y} = V\Sigma^{+} U^T \mathbf{b} \]
(Entries \(y_i\) with \(\sigma_i = 0\) are unconstrained, so minimizing \(||\mathbf{y}||_2\) sets them to zero.)
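A sketch of this recipe on a deliberately rank-deficient \(A\) (the matrix and the `1e-12` cutoff are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((5, 4))
A[:, 3] = A[:, 0] + A[:, 1]          # dependent column: rank A = 3
b = rng.standard_normal(5)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
s_plus = np.zeros_like(s)
nz = s > 1e-12 * s[0]                # treat tiny sigma_i as exactly zero
s_plus[nz] = 1.0 / s[nz]
x = Vt.T @ (s_plus * (U.T @ b))      # x = V Sigma^+ U^T b

assert np.allclose(A.T @ A @ x, A.T @ b)        # normal equations hold
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)    # numpy's min-norm LS solution
assert np.allclose(x, x_ls)
```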
Pseudoinverse
\[ A^+ = V\Sigma^{+} U^T \]
Properties
- \(A\) square and invertible \(\Rightarrow A^+ = A^{-1}\)
- \(A\) overdetermined \(\Rightarrow A^+ \mathbf{b}\) gives least-squares solution to \(A\mathbf{x} \approx \mathbf{b}\)
- \(A\) underdetermined \(\Rightarrow A^+ \mathbf{b}\) gives least-squares solution to \(A\mathbf{x} \approx \mathbf{b}\) with least (Euclidean) norm
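These properties are easy to spot-check (a sketch; the overdetermined case is shown, with full column rank so all \(\sigma_i > 0\)):

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((6, 3))
b = rng.standard_normal(6)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_plus = Vt.T @ np.diag(1.0 / s) @ U.T     # A^+ = V Sigma^+ U^T

assert np.allclose(A_plus, np.linalg.pinv(A))
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(A_plus @ b, x_ls)       # A^+ b is the least-squares x
```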
Alternative Form
\[ A=U\Sigma V^T \Rightarrow A = \sum_{i=1}^l \sigma_i \mathbf{u}_i \mathbf{v}_i^T\\ l:= \min\{m, n\}\\ A\mathbf{x} = \sum_{i=1}^l \sigma_i \mathbf{u}_i (\mathbf{v}_i^T \mathbf{x}) \approx \sum_{i=1}^k \sigma_i \mathbf{u}_i \mathbf{v}_i^T \mathbf{x} \]
Outer Product
\[ \mathbf{u} \otimes \mathbf{v} := \mathbf{u} \mathbf{v}^T \]
Computing \(A\mathbf{x}\):
\[ A\mathbf{x} = \sum_{i=1}^{l} \sigma_i (\mathbf{v}_i \cdot \mathbf{x}) \mathbf{u}_i \]
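A sketch of this evaluation (arbitrary sizes), summing one rank-one term at a time:

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((5, 4))
x = rng.standard_normal(4)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
Ax = sum(s[i] * (Vt[i] @ x) * U[:, i] for i in range(len(s)))
assert np.allclose(Ax, A @ x)
# Dropping the trailing small-sigma terms gives the truncation trick below.
```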
Trick: Ignore small \(\sigma_i\).
\[ A^+ = \sum_{\sigma_i \neq 0} \frac{\mathbf{v}_i \mathbf{u}_i^T}{\sigma_i} \]
Trick: Ignore large \(\sigma_i\) (their terms enter \(A^+\) with small coefficients \(1/\sigma_i\)).
Eckart-Young Theorem
Suppose \(\tilde{A}\) is obtained from \(A=U\Sigma V^T\) by truncating all but the \(k\) largest singular values \(\sigma_i\) of \(A\) to zero. Then, \(\tilde{A}\) minimizes both \(||A-\tilde{A}||_{\text{Fro}}\) and \(||A-\tilde{A}||_2\) subject to the constraint that the column space of \(\tilde{A}\) has at most dimension \(k\).
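A numerical illustration of the theorem (a sketch; the random rank-\(k\) competitors \(B\) are an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(9)
A = rng.standard_normal((8, 6))
k = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]             # truncated SVD
assert np.isclose(np.linalg.norm(A - A_k, 2), s[k])  # error is sigma_{k+1}

for _ in range(100):  # A_k beats random rank-<=k competitors in both norms
    B = rng.standard_normal((8, k)) @ rng.standard_normal((k, 6))
    assert np.linalg.norm(A - A_k, 'fro') <= np.linalg.norm(A - B, 'fro')
    assert np.linalg.norm(A - A_k, 2) <= np.linalg.norm(A - B, 2)
```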
Proof:
\[ A=U\Sigma V^T, \sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_l \]
Define \(A_k = \sum_{i=1}^k \sigma_i \mathbf{u}_i \mathbf{v}_i^T, k \leq l\).
\[ \begin{aligned} ||A-A_k||_2 &= ||\sum_{i=k+1}^l \sigma_i \mathbf{u}_i \mathbf{v}_i^T||_2 =: ||M||_2 = \sigma_{k+1} \end{aligned} \]
The last equality holds because \(\sigma_{k+1}\) is the largest singular value of \(M\), and it is attained:
\[ \mathbf{x} = \mathbf{v}_{k+1} \Rightarrow M\mathbf{x} = \sum_{i=k+1}^l \sigma_i \mathbf{u}_i (\mathbf{v}_i^T \mathbf{v}_{k+1}) = \sigma_{k+1} \mathbf{u}_{k+1} \Rightarrow ||M||_2 = \sigma_{k+1} \]
Suppose \(B = XY^T\) (so \(\text{rank } B \leq k\)), where \(X \in \mathbb{R}^{m\times k}, Y \in \mathbb{R}^{n\times k}\). Define \(V_{k+1} = \begin{bmatrix} \mathbf{v}_1 & \dots & \mathbf{v}_{k+1} \end{bmatrix} \in \mathbb{R}^{n\times (k+1)}\). Since \(Y^T V_{k+1} \in \mathbb{R}^{k\times (k+1)}\) has more columns than rows, it has a nontrivial nullspace \(\Rightarrow \exists \mathbf{z} \neq \mathbf{0}\) s.t. \(Y^T V_{k+1} \mathbf{z} = \mathbf{0}\). Scale \(\mathbf{z}\) s.t. \(||\mathbf{z}||_2 = 1\).
\[ \begin{aligned} ||A-B||_2^2 &= \max_{||\mathbf{q}||_2 = 1} ||(A-B)\mathbf{q}||_2^2 \\ &\geq ||(A-B)V_{k+1} \mathbf{z}||_2^2 = ||(U\Sigma V^T - XY^T)V_{k+1} \mathbf{z}||_2^2 \\ &= ||U\Sigma V^T V_{k+1} \mathbf{z} - XY^T V_{k+1} \mathbf{z}||_2^2 \\ &= ||U\Sigma V^T V_{k+1} \mathbf{z}||_2^2\\ &= \sum_{i=1}^{k+1} z_i^2 \sigma_i^2 \geq \sigma_{k+1}^2 = ||A-A_k||_2^2 \end{aligned} \]
Here the second step uses \(||V_{k+1}\mathbf{z}||_2 = ||\mathbf{z}||_2 = 1\) (orthonormal columns), and the final inequality uses \(\sigma_i \geq \sigma_{k+1}\) for \(i \leq k+1\) together with \(\sum_i z_i^2 = 1\).
Matrix Norm Expressions
\[ ||A||_{\text{Fro}}^2 = \sum_i \sigma_i^2 \qquad ||A||_2 = \max_i \sigma_i \qquad \text{cond}(A) = \frac{\sigma_{\max}}{\sigma_{\min}} \]
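Each identity can be checked against numpy's built-ins (a sketch):

```python
import numpy as np

rng = np.random.default_rng(10)
A = rng.standard_normal((5, 4))
s = np.linalg.svd(A, compute_uv=False)

assert np.isclose(np.linalg.norm(A, 'fro') ** 2, np.sum(s ** 2))
assert np.isclose(np.linalg.norm(A, 2), s.max())
assert np.isclose(np.linalg.cond(A, 2), s.max() / s.min())
```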